[AIR - Datasets] Cast tensor extension type to opaque object dtype in .to_pandas(), .to_dask(), etc. #29417

Merged


clarkzinzow

@clarkzinzow clarkzinzow commented Oct 17, 2022

Cast tensor columns in Pandas views of Datasets to an opaque object dtype, to match the semantics of our batch UDFs and iterators.
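For readers unfamiliar with the term, here is a minimal sketch of what an "opaque object dtype" tensor column looks like in plain pandas. It is not the code from this PR: `to_opaque_object_column` is a hypothetical helper, and the example assumes a batch of equally shaped tensors held in a single NumPy array.

```python
import numpy as np
import pandas as pd


def to_opaque_object_column(batch: np.ndarray) -> pd.Series:
    """Hypothetical helper (not part of Ray): store each row of an
    (N, ...) array as its own ndarray in an object-dtype Series."""
    # Pre-allocate a 1-D object array so NumPy does not try to
    # interpret the per-row tensors as extra dimensions.
    column = np.empty(len(batch), dtype=object)
    for i, tensor in enumerate(batch):
        column[i] = tensor
    return pd.Series(column)


# Three 2x2 tensors, one per row.
batch = np.arange(12).reshape(3, 2, 2)

df = pd.DataFrame({"id": [0, 1, 2]})
df["tensor"] = to_opaque_object_column(batch)

# The tensor column has dtype object; each cell is a plain ndarray.
print(df.dtypes)
print(df["tensor"].iloc[1])
```

With this representation pandas treats each cell as an arbitrary Python object, so downstream code sees ordinary `np.ndarray` values rather than an extension type.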

Related issue number

Closes #29490

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@clarkzinzow clarkzinzow marked this pull request as ready for review October 17, 2022 22:32
@clarkzinzow clarkzinzow force-pushed the datasets/fix/to-pandas-tensor-type branch from 5c2a6cf to 599dc0a Compare October 20, 2022 02:02
@clarkzinzow clarkzinzow force-pushed the datasets/fix/to-pandas-tensor-type branch from 599dc0a to 55d47d9 Compare October 20, 2022 23:26

@c21 c21 left a comment

LGTM besides one question.

python/ray/data/dataset.py (outdated, resolved)
@clarkzinzow

CI looks good, merging!

@clarkzinzow clarkzinzow merged commit 2d4ccb4 into ray-project:master Oct 24, 2022
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this pull request Dec 19, 2022
… `.to_pandas()`, `.to_dask()`, etc. (ray-project#29417)

Cast tensor columns in Pandas views of Datasets to an opaque object dtype, to match the semantics of our batch UDFs and iterators.

Signed-off-by: Weichen Xu <[email protected]>
Development

Successfully merging this pull request may close these issues.

[Datasets] No tensor column casting for ds.to_pandas(), ds.to_dask(), etc.